Project 1 - DualLens Analytics¶
Background Story¶
In the rapidly evolving world of finance and technology, investors are constantly seeking ways to make smarter decisions by combining traditional financial analysis with emerging technological insights. While stock market trends provide a numerical perspective on growth, an organization’s initiatives in cutting-edge fields like Artificial Intelligence (AI) reveal its future readiness and innovation potential. However, analyzing both dimensions - quantitative financial performance and qualitative AI initiatives - requires sifting through multiple, diverse data sources: stock data from platforms like Yahoo Finance, reports in PDFs, and contextual reasoning using Large Language Models (LLMs).
This is where DualLens Analytics comes in. By applying a dual-lens approach, the project leverages Retrieval-Augmented Generation (RAG) to merge financial growth data with strategic insights from organizational reports. Stock data provides evidence of stability and momentum, while AI initiative documents reveal forward-looking innovation. Together, they form a richer, more holistic picture of organizational potential.
With DualLens Analytics, investors no longer need to choose between numbers and narratives—they gain a unified, AI-driven perspective that ranks organizations by both financial strength and innovation readiness, enabling smarter, future-focused investment strategies.
Problem Statement¶
Traditional investment analysis often focuses on financial metrics alone (e.g., stock growth, revenue, market cap), missing the qualitative dimension of how prepared a company is for the future. On the other hand, qualitative documents like strategy PDFs contain valuable insights about innovation and AI initiatives, but they are difficult to structure, query, and integrate with numeric financial data.
This leads to three core challenges:
Fragmented Data Sources: Financial data (stock prices) and strategic insights (PDFs) exist in silos.
Limited Analytical Scope: Manual analysis of growth trends and PDF reports is time-consuming and error-prone.
Decisional Blind Spots: Without integrating both quantitative (growth trends) and qualitative (AI initiatives) signals, investors may miss out on high-potential organizations.
Solution Approach¶
To address this challenge, we set out to build a Retrieval-Augmented Generation (RAG) powered system that blends financial trends with AI-related strategic insights, helping investors rank organizations based on growth trajectory and innovation capacity.
NOTE
Look for "--- --- ---" in the notebook and add your code in its place; it is a placeholder.
Setting up Installations and Imports¶
from google.colab import drive
drive.mount('/content/drive')
# @title Run this cell => Restart the session => Start executing the below cells **(DO NOT EXECUTE THIS CELL AGAIN)**
# !pip install langchain==0.3.25 \
# langchain-core==0.3.65 \
# langchain-openai==0.3.24 \
# chromadb==0.6.3 \
# langchain-community==0.3.20 \
# pypdf==5.4.0
# Standard Library Imports
import os
import time
import json
import re
from datetime import date, timedelta
import math
from typing import List, Dict, Any, Optional
import textwrap
# Data Handling / Analysis
import pandas as pd
import numpy as np
# Plotting / Visualization
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
import plotly.express as px
import plotly.graph_objects as go
import plotly.subplots as sp
# Financial Data Retrieval
import yfinance as yf
# LangChain (text splitting, PDF loading, vector store, LLM)
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFDirectoryLoader, PyPDFLoader
from langchain_community.vectorstores import Chroma
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
1. Organization Selection¶
We select the following five organizations as the analysis pool.
companies = ["GOOGL", "MSFT", "IBM", "NVDA", "AMZN"]
years = 3
end = date.today()
start = end - timedelta(days=365*years)
2. Setting up LLM - 1 Mark¶
- The config.json file should contain the API_KEY and the API BASE URL provided by OpenAI.
- You need to insert your actual API key and endpoint URL obtained from your Olympus account. Refer to the OpenAI Access Token documentation for more information on how to generate and manage your API keys.
- The code below reads the config.json file and extracts the API details.
- The API_KEY is a unique secret key that authorizes your requests to OpenAI's API.
- The OPENAI_API_BASE is the base URL where the model will process your requests.

What To Do?

- Use the sample config.json file provided.
- Add your OpenAI API Key and Base URL to the file.
- The config.json should look like this:

{
  "API_KEY": "your_openai_api_key_here",
  "OPENAI_API_BASE": "https://your_openai_api_base/v1"
}
#Loading the `config.json` file
import json
import os
# Load the JSON file and extract values
file_name = "config.json"
with open(file_name, 'r') as file:
config = json.load(file)
os.environ['OPENAI_API_KEY'] = config['API_KEY'] # Loading the API Key
os.environ["OPENAI_BASE_URL"] = config['OPENAI_API_BASE'] # Loading the API Base Url
print(f"OPENAI_BASE_URL: {os.environ['OPENAI_BASE_URL']}")
OPENAI_BASE_URL: https://aibe.mygreatlearning.com/openai/v1
3. Visualization and Insight Extraction - 5 Marks¶
Stock¶
- Stocks: a type of security that gives stockholders a share of ownership in a company.
Financial Metrics¶
- Market Cap: Total market value of a company’s outstanding shares.
- P/E Ratio: Shows how much investors are willing to pay per dollar of earnings.
- Dividend Yield: Annual dividend income as a percentage of the stock price.
- Beta: Measures a stock’s volatility relative to the overall market.
- Total Revenue: The total income a company generates from its business operations.
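As a quick numeric illustration of how three of these metrics relate to each other (toy figures, not real quotes):

```python
# Hypothetical per-share figures, for illustration only
price = 150.0              # current share price, USD
eps = 5.0                  # trailing earnings per share, USD
annual_dividend = 1.8      # dividends paid per share over the year, USD
shares_outstanding = 2e9   # total shares outstanding

pe_ratio = price / eps                     # 30.0 -> investors pay $30 per $1 of earnings
dividend_yield = annual_dividend / price   # ~0.012 -> 1.2% of the share price per year
market_cap = price * shares_outstanding    # 3.0e11 -> a $300B company

print(pe_ratio, dividend_yield, market_cap)
```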
Stock Task
- Loop through each company to retrieve stock data of the last three years using the yfinance library.
- Plot the closing prices for each company.
Metric Task
- Loop through all the companies to fetch data based on the specified financial metrics.
- Create a DataFrame (DF) from the collected data.
- Visualize and compare each financial metric across all companies.
- For example, visualize and compare the market capitalization for each company.
Tip: Check ticker.info for the available financial metrics
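Since ticker.info is a plain dict, metrics can be pulled safely with .get, which returns None for missing fields instead of raising. A minimal sketch, using a hypothetical stand-in dict in place of a live ticker.info (a real call needs network access):

```python
# Hypothetical stand-in for yf.Ticker(sym).info; the real dict has many more keys
info = {
    "marketCap": 3.1e12,
    "trailingPE": 27.7,
    "dividendYield": 0.0032,
    "beta": 1.0,
    "totalRevenue": 3.5e11,
}

wanted = ["marketCap", "trailingPE", "dividendYield", "beta", "totalRevenue"]
metrics = {k: info.get(k) for k in wanted}  # missing keys become None instead of raising
print(metrics)
```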
Financial Visual Analysis¶
In the context of creating financial visuals like dashboards or charts, functions such as get_stock_data, get_financial_metrics_yf, build_metrics_dataframe_yf, plot_stock_trends, and plot_financial_comparison_with_table are essential tools. They pull financial data and create visualizations, encapsulating specific data-processing tasks and organizing complex calculations into manageable, reusable components. This keeps the code clear and consistent, reduces errors, and allows efficient updates and adjustments to the visual displays. Furthermore, these functions are designed to be called when building the Retrieval-Augmented Generation (RAG) prompt, allowing the system to seamlessly integrate the financial data into the LLM-based analysis and recommendations.
1. Function for Stock data¶
def get_stock_data(companies: List[str], start_date: date, end_date: date, interval: str = "1d") -> Dict[str, pd.DataFrame]:
"""
Fetches historical stock data for a list of companies.
Args:
companies: A list of ticker symbols.
start_date: The start date for the historical data.
end_date: The end date for the historical data.
interval: The data interval (e.g., "1d", "1wk", "1mo").
Returns:
A dictionary where keys are ticker symbols and values are pandas DataFrames
containing the historical stock data.
"""
stock_data = {}
for symbol in companies:
data = yf.Ticker(symbol).history(start=start_date, end=end_date, interval=interval)
if data.empty:
print(f"[WARN] No data for {symbol}")
continue
stock_data[symbol] = data
return stock_data
# Example usage with the existing variables
companies = ["GOOGL", "MSFT", "IBM", "NVDA", "AMZN"]
years = 3
end = date.today()
start = end - timedelta(days=365 * years)
all_stock_data = get_stock_data(companies, start, end)
2. Function for Financial Metrics¶
def _to_float(x) -> Optional[float]:
try:
if x is None or (isinstance(x, float) and math.isnan(x)):
return None
return float(x)
except Exception:
return None
def _valid_ticker(sym: str) -> bool:
return isinstance(sym, str) and sym.strip() != ""
def _df_get_anyindex(df: pd.DataFrame, names: List[str], col) -> Optional[float]:
"""Try multiple row labels in a df (Yahoo can vary between 'Total Revenue' vs 'TotalRevenue')."""
if df is None or df.empty:
return None
idx = df.index.astype(str).str.lower()
for name in names:
name_l = name.lower()
if name_l in idx.values:
try:
return _to_float(df.loc[idx == name_l, col].values[0])
except Exception:
pass
return None
def _latest_column(df: pd.DataFrame):
"""Return most recent (leftmost) column label for Yahoo financials frames."""
if df is None or df.empty:
return None
# yfinance often has columns sorted newest->oldest; take the first
return df.columns[0]
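The behaviour of _df_get_anyindex and _latest_column can be sanity-checked offline with a toy statement frame; this sketch inlines a simplified version of the same case-insensitive, newest-column lookup (the DataFrame below is made up, not Yahoo data):

```python
import pandas as pd
from typing import List, Optional

# Toy stand-in for a yfinance income statement (columns newest -> oldest, values in $B)
income = pd.DataFrame(
    {"2024-12-31": [350.0, 100.0], "2023-12-31": [300.0, 80.0]},
    index=["Total Revenue", "Net Income"],
)

def lookup(df: pd.DataFrame, names: List[str]) -> Optional[float]:
    """Case-insensitive row lookup in the most recent column, mirroring the helpers above."""
    idx = df.index.astype(str).str.lower()
    col = df.columns[0]  # newest column first, as in _latest_column
    for name in names:
        match = df.loc[idx == name.lower(), col]
        if not match.empty:
            return float(match.values[0])
    return None

# 'TotalRevenue' misses, the fallback label 'Total Revenue' hits
print(lookup(income, ["TotalRevenue", "Total Revenue"]))  # 350.0
```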
def get_financial_metrics_yf(symbol: str) -> Dict[str, Optional[float]]:
"""
Return a dict with:
- market_cap_bil (USD, billions)
- pe_ratio (trailing, fallback forward)
- dividend_yield (decimal, e.g., 0.012 = 1.2%)
- beta
- total_revenue_bil (USD, billions)
Uses yfinance fundamentals (no historical price download).
Falls back to statements if some fields missing.
Values may be None if not derivable.
"""
sym = symbol.strip().upper()
if not _valid_ticker(sym):
return {
"market_cap_bil": None, "pe_ratio": None,
"dividend_yield": None, "beta": None, "total_revenue_bil": None,
}
t = yf.Ticker(sym)
market_cap = pe_ratio = dividend_yield = beta = None
total_revenue = debt_to_equity = profit_margins = None
# ---------- 1) Try .fast_info first (fast & lightweight) ----------
try:
fi = getattr(t, "fast_info", None)
if fi:
            # fast_info may be a dict or an attribute object depending on the yfinance version
            market_cap = _to_float(fi.get("market_cap") if isinstance(fi, dict) else getattr(fi, "market_cap", None))
# fast_info sometimes lacks others; we primarily use it for market_cap
except Exception:
pass
# ---------- 2) Try .info (richer but can be slow) ----------
info = {}
try:
info = t.info or {}
except Exception:
info = {}
# Market cap (fallback if not in fast_info)
if market_cap is None:
market_cap = _to_float(info.get("marketCap"))
# P/E (prefer trailing, else forward)
pe_ratio = _to_float(info.get("trailingPE")) or _to_float(info.get("forwardPE"))
# Dividend yield (decimal)
dividend_yield = _to_float(info.get("dividendYield"))
if dividend_yield is None:
# Some tickers provide trailingAnnualDividendRate and currentPrice; compute yield = div / price
try:
div_rate = _to_float(info.get("trailingAnnualDividendRate"))
price = _to_float(info.get("currentPrice"))
if div_rate is not None and price and price > 0:
dividend_yield = div_rate / price
except Exception:
pass
# Beta
beta = _to_float(info.get("beta")) or _to_float(info.get("beta5Year"))
# ---------- 3) Fallbacks from statements ----------
# Annual income statement for total revenue & net income (for margins computation if missing)
income = None
try:
income = t.financials # annual
except Exception:
income = None
# Annual balance sheet for D/E and P/B fallback
bs = None
try:
bs = t.balance_sheet # annual
except Exception:
bs = None
# Total revenue (latest column)
if total_revenue is None:
try:
col = _latest_column(income)
total_revenue = _df_get_anyindex(income, ["Total Revenue", "TotalRevenue"], col)
except Exception:
pass
# Profit margins fallback: net_income / total_revenue
if profit_margins is None:
try:
col = _latest_column(income)
net_income = _df_get_anyindex(income, ["Net Income", "NetIncome"], col)
if total_revenue is None:
total_revenue = _df_get_anyindex(income, ["Total Revenue", "TotalRevenue"], col)
if net_income is not None and total_revenue and total_revenue != 0:
profit_margins = float(net_income) / float(total_revenue)
except Exception:
pass
# ---------- 4) Normalize units ----------
market_cap_bil = (market_cap / 1e9) if market_cap is not None else None
total_revenue_bil = (total_revenue / 1e9) if total_revenue is not None else None
return {
"market_cap": market_cap_bil,
"pe_ratio": pe_ratio,
"dividend_yield": dividend_yield, # decimal (e.g., 0.012 = 1.2%)
"beta": beta,
"total_revenue": total_revenue_bil
}
3. Function for Visualization¶
def plot_stock_trends(all_stock_data, companies, start_date, end_date):
"""
Plots the closing price trends for a list of companies.
Args:
all_stock_data: A dictionary with company symbols as keys and DataFrames as values.
companies: A list of ticker symbols.
start_date: The start date for the plot title.
end_date: The end date for the plot title.
"""
plt.figure(figsize=(14, 7))
for symbol in companies:
data = all_stock_data.get(symbol) # Get the DataFrame for the current symbol
if data is not None and not data.empty: # Check if data exists and is not empty
plt.plot(data.index, data["Close"], label=symbol)
else:
print(f"[WARN] No data available to plot for {symbol}")
plt.title(f"Stock Price Trends ({start_date} → {end_date})")
plt.xlabel("Date")
plt.ylabel("Price (USD)")
plt.legend()
plt.grid(True)
plt.show()
# Call the function with the existing variables
plot_stock_trends(all_stock_data, companies, start, end)
# Convert to DataFrame
def build_metrics_dataframe_yf(tickers: List[str]) -> pd.DataFrame:
rows = []
for sym in tickers:
m = get_financial_metrics_yf(sym)
rows.append({"ticker": sym, **m})
df = pd.DataFrame(rows)
return df
fin_df = build_metrics_dataframe_yf(companies)
fin_df_fmt = fin_df.copy()
# market_cap and total_revenue are already in billions (see get_financial_metrics_yf),
# so only the dividend yield needs converting from a decimal to a percentage here
fin_df_fmt["dividend_yield"] = pd.to_numeric(fin_df_fmt["dividend_yield"], errors="coerce") * 100
def plot_financial_comparison_with_table(df, metrics=['market_cap', 'pe_ratio', 'dividend_yield', 'beta', 'total_revenue']):
num_metrics = len(metrics)
# Map metrics to readable titles
titles = {
'market_cap': 'Market Cap (Bil USD)',
'pe_ratio': 'P/E Ratio',
'dividend_yield': 'Dividend Yield (%)',
'beta': 'Beta',
'total_revenue': 'Total Revenue (Bil USD)'
}
# Create subplots (2 rows, 3 columns for 5 metrics)
fig = sp.make_subplots(rows=2, cols=3,
subplot_titles=[titles[m] for m in metrics],
vertical_spacing=0.2)
# Add bar plots for each metric
for i, metric in enumerate(metrics):
row = (i // 3) + 1
col = (i % 3) + 1
fig.add_trace(
go.Bar(x=df['ticker'], y=df[metric], name=titles[metric],
marker_color='indianred'),
row=row, col=col
)
# Update layout for size and title
fig.update_layout(
height=800, # Larger height for better readability
width=1200,
title_text='Financial Metrics Comparison',
showlegend=False,
margin=dict(t=100) # space at top for title
)
# Create a DataFrame display as a table
table = go.Figure(data=[go.Table(
header=dict(values=["Company"] + [titles[m] for m in metrics],
fill_color='paleturquoise',
align='left'),
cells=dict(values=[
df['ticker'],
*[df[m] for m in metrics]
],
fill_color='lavender',
align='left'))
])
table.update_layout(
height=300,
margin=dict(t=20)
)
# Show plots and table
fig.show()
table.show()
# Usage:
plot_financial_comparison_with_table(fin_df)
Step 1 — Get Price History (5 years) and Build stocks_dict¶
# Example ticker universe
companies = ["GOOGL", "MSFT", "IBM", "NVDA", "AMZN"]
# Fetch 5-year price history for each ticker
start = date(2020, 1, 1)
end = date(2025, 1, 1)
stocks_dict = get_stock_data(
companies,
start_date=start,
end_date=end,
interval="1d"
)
# Now stocks_dict looks like:
# {
# "GOOGL": <DataFrame of OHLCV over 5 years>,
# "MSFT": <DataFrame of OHLCV over 5 years>,
# "IBM": <DataFrame of OHLCV over 5 years>,
# ... (and similarly for NVDA and AMZN)
# }
print(type(stocks_dict))
print(stocks_dict.keys())
<class 'dict'>
dict_keys(['GOOGL', 'MSFT', 'IBM', 'NVDA', 'AMZN'])
Step 2 — Build Metrics DataFrame¶
fin_df = build_metrics_dataframe_yf(companies)
print(fin_df)
  ticker   market_cap   pe_ratio  dividend_yield   beta  total_revenue
0  GOOGL  3146.676437  27.710022            0.32  1.000        350.018
1   MSFT  3892.038861  38.472446            0.70  1.023        281.724
2    IBM   287.393677  36.646004            2.19  0.724         62.753
3   NVDA  4534.872048  53.217140            0.02  2.123        130.497
4   AMZN  2391.180050  34.126330            0.00  1.281        637.959
4. RAG-Driven Analysis - 7 Marks¶
Performing the RAG-Driven Analysis on the AI Initiatives of the companies
Your Task
- Extract all PDF files from the provided ZIP file.
- Read the content from each PDF file.
- Split the content into manageable chunks.
- Store the chunks in a vector database using embedding functions.
- Implement a query mechanism on the vector database to retrieve results based on user queries regarding AI initiatives.
- Evaluate the LLM generated response using LLM-as-Judge
1A. Loading Company AI Initiative Documents (PDFs) - 1 mark¶
# Unzipping the AI Initiatives Documents
import zipfile
with zipfile.ZipFile("/content/Companies-AI-Initiatives.zip", 'r') as zip_ref:
zip_ref.extractall("/content/") # Storing all the unzipped contents in this location
Read the content from each PDF file¶
# Path of all AI Initiative Documents
ai_initiative_pdf_paths = [f"/content/Companies-AI-Initiatives/{file}" for file in os.listdir("/content/Companies-AI-Initiatives")]
ai_initiative_pdf_paths
['/content/Companies-AI-Initiatives/AMZN.pdf', '/content/Companies-AI-Initiatives/MSFT.pdf', '/content/Companies-AI-Initiatives/GOOGL.pdf', '/content/Companies-AI-Initiatives/IBM.pdf', '/content/Companies-AI-Initiatives/NVDA.pdf']
from langchain_community.document_loaders import PyPDFDirectoryLoader
loader = PyPDFDirectoryLoader(path = "/content/Companies-AI-Initiatives/") # Creating a PDF loader object
Split the PDF content into chunks¶
# Defining the text splitter
# Token-based recursive split using the tiktoken encoder
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
encoding_name="cl100k_base",
chunk_size=800, # good default
chunk_overlap=120, # ~15% overlap
separators=[ # encourages clean boundaries
"\n\n", "\n", ". ", " ", ""
],
add_start_index=True,
keep_separator=False
)
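The effect of chunk_overlap can be illustrated with a simplified character-based splitter. This is a sketch of the idea only; the real splitter above is token-based and prefers clean boundaries rather than fixed offsets:

```python
def naive_split(text: str, chunk_size: int, overlap: int) -> list:
    """Fixed-size character chunks where each chunk repeats the tail of the previous one."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = naive_split("abcdefghij", chunk_size=4, overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

The repeated tail (e.g. "cd" at the end of the first chunk and the start of the second) is what keeps a sentence that straddles a boundary retrievable from either chunk.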
# Splitting the chunks using the text splitter
ai_initiative_chunks = loader.load_and_split(text_splitter)
# Total length of all the chunks
len(ai_initiative_chunks)
76
1B. Vectorizing AI Initiative Documents with ChromaDB - 1 mark¶
# Defining the 'text-embedding-ada-002' as the embedding model
from langchain_openai import OpenAIEmbeddings
embedding_model = OpenAIEmbeddings(model='text-embedding-ada-002')
# Creating a Vectorstore, storing all the above created chunks using an embedding model
vectorstore = Chroma.from_documents(
ai_initiative_chunks,
embedding_model,
collection_name="AI_Initiatives"
)
# Ignore if it gives an error or warning
ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event ClientStartEvent: capture() takes 1 positional argument but 3 were given ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event ClientCreateCollectionEvent: capture() takes 1 positional argument but 3 were given
# Creating a retriever object that fetches the six most similar chunks from the vectorstore
retriever = vectorstore.as_retriever(
search_type="similarity",
search_kwargs={ 'k': 6}
)
1C. Retrieving relevant Documents - 3 marks¶
user_message = "Give me the best project that `IBM` company is working upon"
# Building the context for the query using the retrieved chunks
relevant_document_chunks = retriever.invoke(user_message)
context_list = [d.page_content for d in relevant_document_chunks]
context_for_query = ". ".join(context_list)
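Joining chunks with ". " can blur where one excerpt ends and the next begins. A hypothetical alternative is an explicit delimiter, so the LLM can see the chunk boundaries (the chunk strings below are made-up stand-ins):

```python
# Made-up stand-ins for retrieved chunk texts
chunks = [
    "IBM introduced Granite, a family of open foundation models.",
    "Granite integrates with the watsonx platform.",
]
context_for_query = "\n\n---\n\n".join(chunks)
print(context_for_query)
```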
len(relevant_document_chunks)
6
# Write a system message for an LLM to help craft a response from the provided context
qna_qual_system_message = """You are an expert in finance and technology. Your task is to answer questions about companies'
AI initiatives based on the provided context documents.
This context will begin with the token: ###Context.
The context contains references to specific portions of a document relevant to the user query.
Use the information in the context to provide detailed and relevant answers.
If the context does not contain the answer, state that you cannot answer based on the provided information. Do not make up information!
"""
# Write a user message template which can be used to attach the context and the questions
qna_qual_user_message_template = """
###Context
Below are relevant document excerpts (e.g., annual reports, AI strategy PDFs, R&D notes, or press coverage):
{context}
###Question
Use the provided context to answer the following question about companies' AI initiatives.
{question}
"""
# Format the prompt
formatted_prompt = f"""[INST]{qna_qual_system_message}\n
{'user'}: {qna_qual_user_message_template.format(context=context_for_query, question=user_message)}
[/INST]"""
# This is the prompt we are feeding to the LLM
# print(formatted_prompt)
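The [INST]...[/INST] wrapper is a Llama-style convention; OpenAI chat models natively take role-tagged messages instead, and recent LangChain versions let invoke() accept such a list directly. A sketch of the equivalent structure, built from toy stand-ins for the notebook's prompt pieces (no API call is made here):

```python
# Toy stand-ins for qna_qual_system_message and qna_qual_user_message_template
system_message = "You are an expert in finance and technology."
user_template = "###Context\n{context}\n\n###Question\n{question}"

messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_template.format(
        context="(retrieved chunks would go here)",
        question="Give me the best project that IBM is working upon")},
]
# llm_qual.invoke(messages) should also be accepted by ChatOpenAI
print(messages[0]["role"])  # system
```

Keeping the system instructions in a dedicated system role, rather than inlining them into one string, lets the model weight them as standing instructions.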
from langchain_openai import ChatOpenAI
llm_qual = ChatOpenAI(
model="gpt-4o-mini",
temperature=0.3,
max_tokens=3000,
top_p=0.95,
frequency_penalty=1.2,
stop_sequences=['INST']
)
2.1 Define RAG Function for response¶
# Make the LLM call
resp = llm_qual.invoke(formatted_prompt)
resp.content
"Based on the provided context, one of the best projects that IBM is currently working on is **IBM Granite**. Introduced in September 2023, Granite is a series of open-source, high-performance AI foundation models designed to empower enterprise applications across various industries. The models are efficient, customizable, and scalable, allowing businesses to integrate advanced AI capabilities into their workflows while maintaining control over their data.\n\nGranite models are optimized for enterprise use and trained on diverse datasets including internet content and domain-specific documents. They support IBM's broader goals of advancing AI accessibility and responsible innovation while fostering collaboration within the AI community.\n\nAdditionally, Granite integrates seamlessly with IBM’s Watsonx platform, enabling organizations to deploy and manage AI applications efficiently at scale. This initiative aligns with IBM's strategic focus on providing enterprise-grade AI solutions that enhance decision-making processes and drive innovation across sectors such as finance and healthcare.\n\nOverall, IBM Granite stands out as a significant project due to its potential impact on improving operational efficiency for enterprises through advanced artificial intelligence technologies."
# Define RAG function
def RAG_qualitative(user_message):
"""
Args:
user_message: Takes a user input for which the response should be retrieved from the vectorDB.
Returns:
relevant context as per user query.
"""
relevant_document_chunks = retriever.invoke(user_message)
context_list = [d.page_content for d in relevant_document_chunks]
context_for_query = ". ".join(context_list)
# Combine qna_system_message and qna_user_message_template to create the prompt
prompt = f"""[INST]{qna_qual_system_message}\n
{'user'}: {qna_qual_user_message_template.format(context=context_for_query, question=user_message)}
[/INST]"""
    # Querying the LLM
try:
response = llm_qual.invoke(prompt)
except Exception as e:
response = f'Sorry, I encountered the following error: \n {e}'
return response.content
print(RAG_qualitative("How is the area in which GOOGL is working different from the area in which MSFT is working?"))
Based on the provided context, Google (GOOGL) and Microsoft (MSFT) are both heavily invested in artificial intelligence initiatives, but they focus on different areas and applications within the AI landscape.

1. **Google's Focus**:
- Google is advancing its AI capabilities primarily through its Gemini initiative, which encompasses multimodal foundation models that integrate reasoning, coding, and generative capabilities across text, images, and code. This initiative aims to enhance user experience across various consumer products like Google Search, Gmail, Google Assistant, and enterprise solutions via Google Cloud’s Vertex AI.
- Google's efforts emphasize natural language processing (NLP), computer vision, speech recognition, and generative AI for consumer engagement metrics in tools like Chrome and Bard. Additionally, it focuses on integrating these advanced models into existing services to improve functionalities while maintaining a strong emphasis on responsible AI development.

2. **Microsoft's Focus**:
- Microsoft has aggressively expanded its AI capabilities through Azure AI and partnerships with OpenAI. Its initiatives include proprietary models like Copilot that enhance productivity specifically within Microsoft 365 apps such as Word and Excel.
- The company’s strategy revolves around embedding AI across both consumer applications (like Teams) and enterprise solutions via Azure to streamline workflows and drive digital transformation. Microsoft's approach also includes experimental platforms like Azure AI Foundry Labs aimed at translating advanced research into real-world applications.

In summary:
- **Google** is focused more on developing integrated multimodal systems that enhance user experiences across a wide range of products while emphasizing ethical deployment.
- **Microsoft**, conversely, is concentrating on enhancing productivity tools through specific integrations of generative models within its software ecosystem aimed at business efficiency.

These distinctions highlight how each company approaches their respective markets with unique strategies tailored to their product offerings—Google leaning towards broad integration of advanced features in consumer-facing products versus Microsoft's targeted enhancements in productivity-focused environments.
print(RAG_qualitative("What are the three projects on which MSFT is working upon?"))
Based on the provided context, Microsoft is working on the following three AI initiatives:

1. **Azure AI Foundry Labs**: This is an experimental AI platform designed to accelerate the translation of advanced AI research into real-world applications. It serves as a collaborative hub for developers, startups, enterprises, and Microsoft Research teams.
2. **Microsoft 365 Copilot**: An AI-powered productivity assistant integrated across Microsoft 365 applications (such as Word, Excel, PowerPoint, Outlook, and Teams). It utilizes large language models to provide contextual assistance in drafting content and automating tasks.
3. **GitHub Copilot**: This initiative provides AI-driven coding support within development environments. It aims to improve developer productivity by offering advanced features for coding assistance.

These projects reflect Microsoft's commitment to embedding AI capabilities across its products and services while enhancing productivity and innovation for users.
print(RAG_qualitative("What is the timeline of each project in NVDA?"))
Based on the provided context, here is the timeline for NVIDIA's AI initiatives:

1. **Project G-Assist**:
- **Concept & Demo Phase**: Early prototypes of G-Assist were teased in NVIDIA showcases tied to RTX AI initiatives.
- **Public Availability**: G-Assist became accessible via the NVIDIA App in 2024–2025, marking its first interaction with consumers at scale.
- **Iterative Updates**: Throughout 2024 and 2025, NVIDIA improved memory efficiency, broadened GPU compatibility, and launched plugin SDKs. Developer hackathons were introduced during this period to spur ecosystem growth.

2. **DLSS 4 (Deep Learning Super Sampling)**:
- **Development Timeline (2024–2025)**: DLSS 4 was refined for frame generation and expanded integration with Reflex and advanced motion prediction.
- **Status as of 2025**: DLSS 4 is fully available and integrated into many new AAA titles. It is actively promoted in NVIDIA’s Game Ready drivers.

Overall, these timelines reflect ongoing development efforts within NVIDIA's consumer software division aimed at enhancing user experience through AI capabilities.
print(RAG_qualitative("What are the areas in which AMZN is investing when it comes to AI?"))
Based on the provided context, Amazon (AMZN) is investing in several key areas related to AI:

1. **Retail Enhancements**: Amazon utilizes AI for product recommendations, dynamic pricing, fraud detection, and supply chain optimization to improve the shopping experience.
2. **Amazon Web Services (AWS)**: AWS offers a range of AI and machine learning tools that assist businesses in building intelligent applications. This includes services like Amazon SageMaker, which simplifies the process of building and deploying machine learning models.
3. **Generative AI Initiatives**: The company is developing advanced generative AI capabilities through projects like Amazon Bedrock and Olympus:
- **Amazon Bedrock** provides access to foundation models from leading AI companies for generative applications.
- **Olympus**, a multimodal AI model launched in 2023, can process text, images, and videos simultaneously to enhance search functionalities across platforms.
4. **Voice Technology**: Innovations such as Alexa demonstrate investment in voice recognition technology that enhances user interaction with devices.
5. **Robotics**: Investments are also made in robotics within warehouses to streamline order fulfillment processes.
6. **User Experience Improvements**: By enhancing search capabilities through initiatives like Olympus, Amazon aims to improve user engagement and satisfaction while reducing reliance on external providers for its AI needs.

Overall, these investments reflect a comprehensive strategy aimed at leveraging AI across various aspects of its business operations while maintaining competitive advantages within the market.
print(RAG_qualitative("What are the risks associated with projects within GOOG?"))
The risks associated with projects within Google (GOOG) as outlined in the provided context include:

1. **Privacy Concerns**: Processing live video and audio data raises significant privacy issues, necessitating robust data protection measures.
2. **Technical Hurdles**: Achieving real-time, accurate multimodal understanding requires overcoming complex AI and hardware challenges.
3. **User Acceptance**: Gaining user trust and acceptance for a new form of AI assistant that interacts in more personal and potentially intrusive ways is crucial.
4. **Regulatory Compliance**: Navigating the evolving landscape of AI regulations and ensuring compliance with global standards poses a challenge.
5. **Governance & Compliance**: Enterprises must navigate intellectual property (IP), copyright, and regulatory obligations for generative media when using Google's offerings.
6. **Security & Privacy**: Sensitive enterprise data requires robust security controls to protect against breaches or misuse.
7. **Compute Costs**: Generative workloads are resource-intensive, requiring careful cost monitoring to manage expenses effectively.
8. **Model Safety**: Risks such as hallucinations (the generation of incorrect information) and factual inaccuracies require constant evaluation and moderation to ensure reliability.

These risks highlight the complexities involved in developing advanced AI technologies while maintaining user trust, compliance with regulations, and effective operational management.
1D. Evaluation of the RAG - 2 marks¶
# Writing a question for performing evaluations on the RAG
evaluation_test_question = "What are the three projects on which MSFT is working upon?"
# Building the context for the evaluation test question using the retrieved chunks
relevant_document_chunks = retriever.invoke(evaluation_test_question)
context_list = [d.page_content for d in relevant_document_chunks]
context_for_query = ". ".join(context_list)
# Default RAG Answer
answer = RAG_qualitative(evaluation_test_question)
print(answer)
Based on the provided context, Microsoft is working on the following three AI initiatives: 1. **Azure AI Foundry Labs**: This is an experimental AI platform designed to accelerate the translation of advanced AI research into real-world applications. It serves as a collaborative hub for developers, startups, enterprises, and Microsoft Research teams to experiment with various AI models and tools. 2. **Microsoft 365 Copilot**: An AI-powered productivity assistant integrated across Microsoft 365 applications such as Word, Excel, PowerPoint, Outlook, and Teams. It utilizes large language models to provide intelligent assistance in drafting content, analyzing data, summarizing meetings, and automating repetitive tasks. 3. **GitHub Copilot**: This initiative provides powerful AI-driven coding support within development environments (IDEs), enhancing developer productivity by offering advanced features for coding assistance. These projects illustrate Microsoft's commitment to embedding artificial intelligence across its products and services to improve efficiency and drive innovation.
# Defining user message template for evaluation
evaluation_user_message_template = """
###Question
{question}
###Context
{context}
###Answer
{answer}
"""
1. Groundedness/Faithfulness¶
- How much of the answer is drawn from the context?
# Writing the system message and the evaluation metrics for checking the groundedness
groundedness_rater_system_message = """
You are tasked with rating AI generated answers to questions posed by users.
You will be presented a question, context used by the AI system to generate the answer and an AI generated answer to the question.
In the input, the question will begin with ###Question, the context will begin with ###Context while the AI generated answer will begin with ###Answer.
Evaluation criteria:
The task is to judge the extent to which the metric is followed by the answer.
1 - The metric is not followed at all
2 - The metric is followed only to a limited extent
3 - The metric is followed to a good extent
4 - The metric is followed mostly
5 - The metric is followed completely
Metric:
The answer should be derived only from the information presented in the context
Instructions:
1. First write down the steps that are needed to evaluate the answer as per the metric.
2. Give a step-by-step explanation if the answer adheres to the metric considering the question and context as the input.
3. Next, evaluate the extent to which the metric is followed.
4. Use the previous information to rate the answer using the evaluation criteria and assign a score.
"""
# Combining groundedness_rater_system_message + llm_prompt + answer for evaluation
groundedness_prompt = f"""[INST]{groundedness_rater_system_message}
user: {evaluation_user_message_template.format(context=context_for_query, question=evaluation_test_question, answer=answer)}
[/INST]"""
# Defining a new LLM object
groundedness_checker = ChatOpenAI(
model="gpt-4o-mini",
temperature=0,
max_tokens=500,
top_p=0.95,
frequency_penalty=1.2,
stop_sequences=['INST']
)
# Using the LLM-as-Judge for evaluating Groundedness
groundedness_response = groundedness_checker.invoke(groundedness_prompt)
print(groundedness_response.content)
### Steps to Evaluate the Answer
1. **Identify Key Information in the Context**: Extract the main projects mentioned in the context that Microsoft is working on.
2. **Compare with AI Generated Answer**: Check if all three projects listed in the answer are present and accurately described based on the context provided.
3. **Assess Completeness and Accuracy**: Determine if any additional information not found in the context has been included or if any critical details from the context have been omitted.
4. **Rate Adherence to Metric**: Based on how well these steps align with deriving information solely from the provided context, assign a score according to evaluation criteria.
### Step-by-Step Explanation of Adherence to Metric
1. **Key Information Extraction**:
- The context mentions three specific initiatives/projects:
- Azure AI Foundry Labs
- Microsoft 365 Copilot
- GitHub Copilot
2. **Comparison with AI Generated Answer**:
- The answer lists all three initiatives correctly as follows:
1. Azure AI Foundry Labs
2. Microsoft 365 Copilot
3. GitHub Copilot
3. **Completeness and Accuracy Assessment**:
- Each project is described accurately based on what was presented in the context, including their purposes and functionalities.
- No extraneous information or personal opinions were added; everything aligns directly with what was stated.
4. **Rating Adherence to Metric**:
- Since all key points were derived directly from contextual information without deviation, this indicates complete adherence to metric requirements.
### Evaluation Score
Based on this analysis, I would rate this answer as follows:
- The metric is followed completely (5).
Thus, I assign a score of ***5*** for this response as it fully adheres to deriving its content solely from provided contextual information without introducing external elements or inaccuracies.
2. Relevance¶
- How relevant the retrieval context is to the input.
# Writing the system message and the evaluation metrics for checking the relevance
relevance_rater_system_message = """
You are tasked with rating the relevance of context used by an AI system.
You will be presented with a question and the context used to answer it.
In the input, the question will begin with ###Question and the context will begin with ###Context.
Evaluation criteria:
The task is to judge the extent to which the context is relevant to the question.
1 - The context is not relevant at all
2 - The context is relevant only to a limited extent
3 - The context is relevant to a good extent
4 - The context is mostly relevant
5 - The context is completely relevant
Metric:
Context Relevance measures how well the provided context aligns with and supports answering the question.
Consider whether the context contains the necessary information to properly answer the question.
Instructions:
1. First write down the steps that are needed to evaluate the context relevance.
2. Give a step-by-step explanation of how the context relates to the question.
3. Next, evaluate the extent to which the context supports answering the question.
4. Use the previous information to rate the context using the evaluation criteria and assign a score.
"""
# Combining relevance_rater_system_message + llm_prompt + answer for evaluation
relevance_prompt = f"""[INST]{relevance_rater_system_message}
user: {evaluation_user_message_template.format(context=context_for_query, question=evaluation_test_question, answer=answer)}
[/INST]"""
# Defining a new LLM object
relevance_checker = ChatOpenAI(
model="gpt-4o-mini",
temperature=0,
max_tokens=500,
top_p=0.95,
frequency_penalty=1.2,
stop_sequences=['INST']
)
# Using the LLM-as-Judge for evaluating Relevance
relevance_response = relevance_checker.invoke(relevance_prompt)
print(relevance_response.content)
### Steps to Evaluate Context Relevance 1. **Identify the Question**: Understand what information is being requested in the question. 2. **Analyze the Context**: Read through the provided context to extract relevant details that may answer the question. 3. **Match Information**: Compare elements from both the question and context to see if they align or support each other. 4. **Assess Completeness**: Determine if all parts of the question are addressed by the context, including any specific details requested (e.g., names of projects). 5. **Rate Relevance**: Use a scoring system based on how well aligned and supportive the context is in answering the question. ### Step-by-Step Explanation of Context Relation 1. The question asks for "the three projects on which MSFT is working upon." 2. The provided context discusses various AI initiatives by Microsoft, specifically mentioning: - Azure AI Foundry Labs - Microsoft 365 Copilot - GitHub Copilot 3. Each initiative listed in both contexts directly corresponds with what was asked in terms of identifying specific projects. 4. The description provides sufficient detail about each project, indicating their purpose and relevance within Microsoft's broader strategy. ### Evaluation of Context Support - The context explicitly lists three distinct projects that Microsoft is currently working on related to artificial intelligence. - Each project mentioned has a brief summary explaining its function and significance, which aligns perfectly with what was asked in terms of identifying these initiatives. ### Rating Given that all three specified projects are clearly identified along with relevant descriptions supporting their importance: Score = 5 (The context is completely relevant)
2. DUALLENS: Qualitative and Quantitative Prompt Lens (AI)¶
Deterministic LLM for quantitative synthesis¶
llm_duallens = ChatOpenAI(
model="gpt-4o-mini",
temperature=0,
max_tokens=6000,
top_p=1.0,
frequency_penalty=0
)
# --- 4) Call your Dual-Lens LLM (not the qual-only one) ----------------------
# llm_duallens = ChatOpenAI(model="gpt-4o-mini", temperature=0, max_tokens=900)
def _normalize_tickers(x):
"""Accept 'MSFT' or ['MSFT'] and return a normalized list of upper-case tickers."""
if isinstance(x, str):
return [x.strip().upper()]
return [t.strip().upper() for t in x]
def _price_cagr_annualized(df: pd.DataFrame) -> float | None:
if df is None or df.empty or "Close" not in df.columns: return None
first, last = df["Close"].iloc[0], df["Close"].iloc[-1]
if first <= 0 or last <= 0: return None
# try trading-days fallback if index isn't datetime
try:
days = (df.index[-1] - df.index[0]).days
years = max(days / 365.0, 1e-9)
except Exception:
years = max(len(df) / 252.0, 1e-9)
return (last / first) ** (1 / years) - 1
def _realized_vol_annualized(df: pd.DataFrame) -> float | None:
if df is None or df.empty or "Close" not in df.columns: return None
ret = df["Close"].pct_change().dropna()
if ret.empty: return None
return float(ret.std() * np.sqrt(252))
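As a quick sanity check on the annualization math in the two helpers above, a synthetic two-year series doubling from 100 to 200 should yield a CAGR of √2 − 1 ≈ 41.4%. A minimal sketch that re-inlines the same arithmetic rather than importing the helpers:

```python
import numpy as np
import pandas as pd

# Synthetic two-year daily series doubling from 100 to 200
idx = pd.date_range("2020-01-01", "2021-12-31", freq="D")
df = pd.DataFrame({"Close": np.linspace(100, 200, len(idx))}, index=idx)

# Same arithmetic as _price_cagr_annualized: annualize by calendar days
first, last = df["Close"].iloc[0], df["Close"].iloc[-1]
years = (df.index[-1] - df.index[0]).days / 365.0   # exactly 2.0 here
cagr = (last / first) ** (1 / years) - 1            # sqrt(2) - 1, i.e. ~0.4142

# Same arithmetic as _realized_vol_annualized
vol = float(df["Close"].pct_change().dropna().std() * np.sqrt(252))
```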
def build_perf_risk_table(stocks_dict: dict[str, pd.DataFrame]) -> pd.DataFrame:
rows = []
for tic, sdf in stocks_dict.items():
cagr = _price_cagr_annualized(sdf)
vol = _realized_vol_annualized(sdf)
rows.append({
"Ticker": tic,
"Price CAGR (5y) %": None if cagr is None else round(100*cagr, 2),
"Beta": None, # we’ll fill from fin_df merge if desired
"Realized Vol (ann.) %": None if vol is None else round(100*vol, 2),
})
return pd.DataFrame(rows)
def build_valuation_table(fin_df: pd.DataFrame) -> pd.DataFrame:
# Expect columns: ticker, market_cap, pe_ratio, dividend_yield, beta, total_revenue
tab = fin_df.rename(columns={
"ticker": "Ticker",
"pe_ratio": "P/E",
"market_cap": "Market Cap",
"dividend_yield": "Dividend Yield %",
"beta": "Beta",
"total_revenue": "Total Revenue",
})[["Ticker","P/E","Market Cap","Dividend Yield %","Beta","Total Revenue"]]
return tab
def make_duallens_inputs(
fin_df: pd.DataFrame,
stocks_dict: dict[str, pd.DataFrame],
qual_context: str,
start_date,
end_date,
question: str,
tickers: list[str] | str | None = None, # <— NEW (optional)
):
# 1) normalize and subset (ONE or MANY tickers supported)
if tickers:
_t = _normalize_tickers(tickers)
fin_df = fin_df[fin_df["ticker"].str.upper().isin(_t)].copy()
stocks_dict = {t: df for t, df in stocks_dict.items() if t.upper() in _t}
# 2) build valuation & perf tables from this filtered universe (your existing code)
val_df = build_valuation_table(fin_df)
pr_df = build_perf_risk_table(stocks_dict)
# (optional) map beta into perf table for visibility
beta_map = dict(zip(fin_df["ticker"], fin_df["beta"]))
if "Ticker" in pr_df.columns:
pr_df["Beta"] = pr_df["Ticker"].map(beta_map)
# 3) robust peer rankings (see next section)
peer_md = peer_rankings_md(
fin_df.rename(columns=str),
pr_df.rename(columns=str)
)
return {
"tickers": ", ".join(fin_df["ticker"].tolist()),
"start_date": start_date.isoformat(),
"end_date": end_date.isoformat(),
"valuation_table_md": val_df.to_markdown(index=False),
"perf_risk_table_md": pr_df.to_markdown(index=False),
"peer_rankings_md": peer_md,
"qual_context": qual_context,
"question": question,
}
def peer_rankings_md(fin_df_subset: pd.DataFrame, pr_df_subset: pd.DataFrame) -> str:
lines = []
n = len(fin_df_subset)
# Best Value (P/E)
if "pe_ratio" in fin_df_subset.columns:
pe_sorted = fin_df_subset[["ticker","pe_ratio"]].dropna().sort_values("pe_ratio")
if not pe_sorted.empty:
if n >= 2:
lines.append(f"- **Best Value (P/E)**: {pe_sorted.iloc[0]['ticker']} ({pe_sorted.iloc[0]['pe_ratio']:.2f})")
else:
lines.append(f"- **P/E**: {pe_sorted.iloc[0]['ticker']} ({pe_sorted.iloc[0]['pe_ratio']:.2f})")
# Risk (Beta)
if "beta" in fin_df_subset.columns:
b = fin_df_subset[["ticker","beta"]].dropna().sort_values("beta")
if not b.empty:
if n >= 2:
lines.append(f"- **Lowest Risk (Beta)**: {b.iloc[0]['ticker']} ({b.iloc[0]['beta']:.2f}); "
f"**Highest Risk**: {b.iloc[-1]['ticker']} ({b.iloc[-1]['beta']:.2f})")
else:
lines.append(f"- **Beta**: {b.iloc[0]['ticker']} ({b.iloc[0]['beta']:.2f})")
# Performance (CAGR) — adjust column name if yours differs
cagr_col = None
for col in ["Price CAGR (5y) %", "price_cagr_5y", "price_cagr_pct"]:
if col in pr_df_subset.columns:
cagr_col = col
break
if cagr_col:
c = pr_df_subset[["Ticker", cagr_col]].dropna().sort_values(cagr_col, ascending=False)
if not c.empty:
if n >= 2:
lines.append(f"- **Top Performance ({cagr_col})**: {c.iloc[0]['Ticker']} ({c.iloc[0][cagr_col]:.2f}%)")
else:
lines.append(f"- **{cagr_col}**: {c.iloc[0]['Ticker']} ({c.iloc[0][cagr_col]:.2f}%)")
return "\n".join(lines) if lines else "- No comparable rankings available."
# ---------- system + user templates ----------
qna_duallen_system_message = """
You are a Financial Research Analyst. Evaluate a company by combining:
1) Quantitative Growth Signals (ONLY from get_stock_data + build_metrics_dataframe_yf outputs)
2) Qualitative Innovation Signals (ONLY from <CONTEXT>)
Rules:
- Use only the provided data; do not invent or fetch external info.
- Cite metrics with value + horizon (e.g., “Beta 1.02”, “3-yr (ending {end_date}) CAGR 22.7%”, “Revenue $350B”).
- Keep units consistent: Market Cap & Revenue in $B, Yield in %, Volatility in % (annualized).
- If peer set < 3, avoid “best/worst” and instead use comparative phrasing (e.g., “lower beta than peer (peer set = 2)”).
- If data is missing, state it and continue with caveats.
STRICT anti-hallucination + anti-echo (critical):
- Do NOT copy sentences from <CONTEXT>.
- NEVER copy more than 10 consecutive words from <CONTEXT>.
- DO NOT introduce projects, strategies, or claims unless they are explicitly supported by <CONTEXT>.
- If a project is not supported by evidence, omit it or mark “insufficient evidence.”
QUAL Evidence Step (must run before writing narrative):
From <CONTEXT>, extract up to 3 concrete initiatives for the target ticker:
Format them as:
[E1] Project Name — 1-line paraphrase; evidence snippet ≤10 words; [source | page]
[E2] ...
[E3] ...
If fewer than 3 exist, return fewer and state “insufficient evidence.”
Qualitative Narrative Rules:
- Write the Qual Narrative ONLY using [E1]–[E3].
- Paraphrase the context; summarize in your own words.
- No new projects. No outside knowledge.
Direct Comparison Questions (e.g., “How is GOOGL different from MSFT?”):
Answer using a contrast frame based on:
(a) Domain focus (consumer/ads vs enterprise/productivity)
(b) Go-to-market (B2C vs B2B)
(c) AI posture (platforms/models vs applied integrations)
(d) Monetization vectors (ads/search vs subscriptions/cloud)
Use 2–3 contrast points in the Direct Answer.
Output Format:
- Dual-Lens Assessment
• Quantitative Narrative (2–4 sentences)
• Qualitative Narrative (3–6 sentences, based ONLY on [E1]–[E3])
- Synthesis (how innovation + fundamentals align or conflict)
- Outlook (Bull, Bear, Base, Final View)
- Direct Answer to the Question
- Tickers Analyzed
"""
companies = ["GOOGL","MSFT"]
fin_df_subset = fin_df[fin_df["ticker"].isin(companies)].reset_index(drop=True)
# If your stocks_dict was fetched earlier for a bigger universe, subset it too:
stocks_dict_subset = {t: df for t, df in stocks_dict.items() if t in companies}
# --- 1) Convert retrieved docs -> single context string ----------------------
def build_qual_context_from_docs(docs, k=6, max_chars=6000):
"""docs: list of LangChain Documents with page_content + metadata"""
chunks = []
for d in docs[:k]:
src = d.metadata.get("source") or d.metadata.get("file_path") or d.metadata.get("pdf_file_path") or "unknown"
page = d.metadata.get("page")
label = f"[Source: {src}{'' if page is None else f' | page {page}'}]"
text = d.page_content.strip()
chunks.append(f"{label}\n{text}")  # append each chunk once, with its source label
ctx = "\n\n".join(chunks)
return ctx[:max_chars]
user_message = "How is the area in which GOOGL is working different from the area in which MSFT is working?"
# Get qualitative chunks for THIS question
relevant_document_chunks = retriever.invoke(user_message)
qual_context_raw = build_qual_context_from_docs(relevant_document_chunks)
# --- 2) Build Dual-Lens inputs (QUANT + QUAL) --------------------------------
inp = make_duallens_inputs(
fin_df=fin_df_subset, # <- filtered metrics (GOOGL, MSFT only)
stocks_dict=stocks_dict_subset, # <- filtered price history dict
qual_context=qual_context_raw, # <- fused string
start_date=start,
end_date=end,
question=user_message
)
qna_duallen_user_message_template = """###Quantitative Inputs
As-of Window: {start_date} → {end_date}
**Valuation Snapshot**
{valuation_table_md}
**Performance & Risk (derived from 5-yr price history)**
{perf_risk_table_md}
**Peer Rankings**
{peer_rankings_md}
###Question
{question}
"""
user_msg = qna_duallen_user_message_template.format(**inp)
print(f"DUALLENS Financial Investor \n{user_msg}") # <-- verify the tables + context made it in
resp = llm_duallens.invoke([
{"role": "system", "content": qna_duallen_system_message},
{"role": "user", "content": user_msg},
])
print(resp.content)
DUALLENS Financial Investor ###Quantitative Inputs As-of Window: 2020-01-01 → 2025-01-01 **Valuation Snapshot** | Ticker | P/E | Market Cap | Dividend Yield % | Beta | Total Revenue | |:---------|--------:|-------------:|-------------------:|-------:|----------------:| | GOOGL | 27.71 | 3146.68 | 0.32 | 1 | 350.018 | | MSFT | 38.4724 | 3892.04 | 0.7 | 1.023 | 281.724 | **Performance & Risk (derived from 5-yr price history)** | Ticker | Price CAGR (5y) % | Beta | Realized Vol (ann.) % | |:---------|--------------------:|-------:|------------------------:| | GOOGL | 22.66 | 1 | 32.5 | | MSFT | 22.37 | 1.023 | 30.5 | **Peer Rankings** - **Best Value (P/E)**: GOOGL (27.71) - **Lowest Risk (Beta)**: GOOGL (1.00); **Highest Risk**: MSFT (1.02) - **Top Performance (Price CAGR (5y) %)**: GOOGL (22.66%) ###Question How is the area in which GOOGL is working different from the area in which MSFT is working? ### Dual-Lens Assessment - **Quantitative Narrative**: GOOGL exhibits a P/E ratio of 27.71, indicating a more favorable valuation compared to MSFT's 38.47. With a market cap of $3,146.68B and total revenue of $350.018B, GOOGL also shows a higher 5-year price CAGR of 22.66% compared to MSFT's 22.37%. GOOGL has a lower beta of 1.00, suggesting it carries less risk than MSFT, which has a beta of 1.023. - **Qualitative Narrative**: [E1] Project Name — GOOGL is enhancing its AI capabilities; evidence snippet: “expanding AI tools” [source | page]. [E2] Project Name — GOOGL is focusing on cloud services; evidence snippet: “growing cloud offerings” [source | page]. [E3] Project Name — GOOGL is investing in hardware; evidence snippet: “developing new devices” [source | page]. These initiatives indicate GOOGL's commitment to innovation in AI, cloud computing, and hardware development, positioning it strongly in the tech landscape. 
- **Synthesis**: GOOGL's quantitative metrics reflect strong growth and lower risk, aligning with its qualitative focus on AI and cloud services, which are critical for future revenue streams. The emphasis on hardware development also complements its software and service offerings, suggesting a holistic approach to innovation. - **Outlook**: Bullish. GOOGL's strong fundamentals and innovative initiatives position it well for continued growth. - **Direct Answer to the Question**: GOOGL operates primarily in consumer-focused areas such as advertising and cloud services, while MSFT is more entrenched in enterprise solutions and productivity software. GOOGL's go-to-market strategy is B2C, leveraging its advertising platforms, whereas MSFT focuses on B2B with its subscription and cloud services. Additionally, GOOGL emphasizes AI as a platform for consumer applications, contrasting with MSFT's applied integrations in enterprise environments. - **Tickers Analyzed**: GOOGL, MSFT
D. DualLens Evaluation of the RAG - 2 marks¶
# Defining user message template for evaluation
# ---------- Evaluator (Dual-Lens) system prompt ----------
evaluation_duallens_system_message = """
You are a strict evaluator of a Dual-Lens RAG answer (Quantitative + Qualitative).
Evaluate the candidate answer for:
- Relevance: Does it directly answer the question and focus on the requested ticker(s)?
- Groundedness: Are claims supported by the provided Context (qualitative) and/or the Quant tables?
Rules:
- Use ONLY the provided Context and the quoted Answer.
- Do NOT invent facts.
- If numeric claims are present, verify they exist in the Quant tables or are implied.
- If qualitative claims are present, verify they are supported (even loosely) by the Context text.
Output format (LIST, NOT JSON). Use bold section headers and keep it concise:
- **Relevance Score (0–5)**: <number>
- **Groundedness Score (0–5)**: <number>
- **Overall Score (0–100)**: <number>
- **Relevance Rationale**: <one or two sentences>
- **Groundedness Rationale**: <one or two sentences>
- **Intent Detected**: <valuation | growth | risk | income | general>
"""
# ---------- Evaluator user template (fills Context + Answer) ----------
evaluation_duallens_user_message_template = """
### Question
{question}
### Context (qualitative excerpts + quant tables)
{context}
### Answer (to evaluate)
{answer}
"""
# ------------- Choose ticker & question -------------
evaluation_duallens_test_question = "What are the three projects on which MSFT is working?"
companies = ["MSFT"] # single-ticker case
# ------------- Subset your metrics & price history -------------
fin_df_subset = fin_df[fin_df["ticker"].isin(companies)].reset_index(drop=True)
stocks_dict_subset = {t: df for t, df in stocks_dict.items() if t in companies}
# ------------- Retrieve qualitative chunks for THIS question -------------
relevant_document_chunks = retriever.invoke(evaluation_duallens_test_question)
qual_context_raw = build_qual_context_from_docs(relevant_document_chunks, max_chars=4000) # your cleaner is fine
# ------------- Compose Dual-Lens prompt inputs (quant + qual) -------------
inp = make_duallens_inputs(
fin_df=fin_df_subset,
stocks_dict=stocks_dict_subset,
qual_context=qual_context_raw,
start_date=start,
end_date=end,
question=evaluation_duallens_test_question,
)
# (Optional) sanity check
# print(inp["valuation_table_md"]); print(inp["perf_risk_table_md"]); print(inp["peer_rankings_md"])
# -------- Produce the candidate answer (model under test) --------
candidate_user_msg = qna_duallen_user_message_template.format(**inp)
candidate_resp = llm_duallens.invoke([
{"role": "system", "content": qna_duallen_system_message},
{"role": "user", "content": candidate_user_msg},
])
candidate_answer = candidate_resp.content
# (Optional)
print(candidate_answer)
**QUAL Evidence Step** [E1] Project Name — Microsoft is enhancing its cloud services; evidence snippet “expanding Azure capabilities” [source | page]. [E2] Project Name — The company is investing in AI technologies; evidence snippet “developing AI tools” [source | page]. [E3] Project Name — Microsoft is focusing on cybersecurity solutions; evidence snippet “strengthening security offerings” [source | page]. **Insufficient evidence** for additional projects. --- **Dual-Lens Assessment** • **Quantitative Narrative**: Microsoft has a market capitalization of $3,892.04B and total revenue of $281.724B. The company exhibits a P/E ratio of 38.47 and a 5-year price CAGR of 22.37%. With a beta of 1.023, Microsoft shows a volatility level that is slightly higher than the market average. • **Qualitative Narrative**: Microsoft is actively enhancing its cloud services by expanding Azure capabilities, which positions it well in the competitive cloud market. The company is also investing in AI technologies, focusing on developing tools that leverage artificial intelligence. Additionally, Microsoft is strengthening its cybersecurity offerings, which is increasingly critical in today’s digital landscape. **Synthesis**: The quantitative growth signals indicate strong financial performance and market positioning, while the qualitative innovation signals highlight strategic initiatives in cloud, AI, and cybersecurity. These areas of focus align well with the company's robust revenue growth and market cap, suggesting a coherent strategy for future expansion. **Outlook**: Bull **Direct Answer to the Question**: Microsoft is working on enhancing its cloud services, investing in AI technologies, and focusing on cybersecurity solutions. **Tickers Analyzed**: MSFT
5. DualLens Evaluation¶
# -------- Build the evaluation user message --------
evaluation_user_msg = evaluation_duallens_user_message_template.format(
question=evaluation_duallens_test_question,
context=(
"## Quantitative Inputs\n"
f"**Valuation Snapshot**\n{inp['valuation_table_md']}\n\n"
f"**Performance & Risk**\n{inp['perf_risk_table_md']}\n\n"
f"**Peer Rankings**\n{inp['peer_rankings_md']}\n\n"
"## Qualitative Context\n"
f"{inp['qual_context']}"
),
answer=candidate_answer,
)
# -------- Call evaluator model (e.g., gpt-4o-mini at T=0) --------
llm_evaluator = ChatOpenAI(model="gpt-4o-mini", temperature=0, max_tokens=4000)
evaluation_result = llm_evaluator.invoke([
{"role": "system", "content": evaluation_duallens_system_message},
{"role": "user", "content": evaluation_user_msg},
]).content
print("=== Dual-Lens Evaluation ===")
print(evaluation_result)
=== Dual-Lens Evaluation === - **Relevance Score (0–5)**: 4 - **Groundedness Score (0–5)**: 3 - **Overall Score (0–100)**: 70 - **Relevance Rationale**: The answer directly addresses the question about the projects Microsoft is working on, mentioning three specific areas of focus. - **Groundedness Rationale**: While the answer provides relevant project areas, the specific claims about "expanding Azure capabilities," "developing AI tools," and "strengthening security offerings" are not directly supported by the provided context, which lacks explicit references to these phrases. - **Intent Detected**: growth
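The evaluator was asked for a fixed bold-header layout, so its scores can be extracted mechanically. A hedged sketch, assuming the "**<Name> Score ...**: <number>" format above is honored (`parse_duallens_scores` is an illustrative helper, not part of the graded pipeline):

```python
import re

def parse_duallens_scores(text: str) -> dict[str, float]:
    """Pull the numeric Relevance/Groundedness/Overall scores from the report."""
    pattern = r"\*\*(Relevance|Groundedness|Overall) Score[^:]*:\s*([\d.]+)"
    return {name: float(val) for name, val in re.findall(pattern, text)}
```

Applied to `evaluation_result` above, this would yield the 4 / 3 / 70 scores as floats, ready for aggregation across many test questions.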
5. Scoring and Ranking - 3 Marks¶
Prompting an LLM to score each company by integrating Quantitative data (stock trend, growth metrics) and Qualitative evidence (PDF insights)
Your Task
- Write a system message and a user message that outlines the required data for the prompt.
- Prompt the LLM to rank and recommend companies for investment based on the provided PDF and stock data to achieve better returns.
# Counting the document chunks stored in the vector store
len(vectorstore.get()['documents'])
76
# Write a system message for instructing the LLM for scoring and ranking the companies
system_message = """
You are an equity scoring engine. Score and rank companies using ONLY:
(1) Financial Data table, and (2) AI Initiatives context provided in <AI_DOCS>.
OUTPUT FORMAT (strict JSON):
{
"per_company": {
"<TICKER>": {
"quant": {
"subscores": {
"value_pe": <0-100>,
"scale_market_cap": <0-100>,
"revenue_strength": <0-100>,
"income_dividend": <0-100>,
"risk_beta": <0-100>
},
"composite": <0-100>
},
"ai_readiness": {
"evidence": [
{"id": "E1", "project": "...", "snippet": "...", "source": "file | page"},
{"id": "E2", "project": "...", "snippet": "...", "source": "file | page"}
],
"subscores": {
"scope": <0-100>,
"maturity": <0-100>,
"investment": <0-100>,
"kpi_impact": <0-100>,
"risk_mgmt": <0-100>
},
"composite": <0-100>
},
"duallens": {
"score": <0-100>,
"rating_1_to_5": <1-5 integer>,
"rationale": "1-3 sentences combining quant + AI"
}
},
"... more tickers ..."
},
"ranking": ["<TICKER>", "..."] // sorted by duallens.score desc
}
SCORING RULES
A) QUANT SCORE (from Financial Data):
- Use the fields per ticker: market_cap, pe_ratio, dividend_yield, beta, total_revenue.
- Compute peer-wise percentiles across the provided universe:
• value_pe: lower P/E is better → value_pe = (1 - pct_rank(pe)) * 100
• scale_market_cap: higher market cap better → pct_rank(market_cap)*100
• revenue_strength: higher revenue better → pct_rank(total_revenue)*100
• income_dividend: higher dividend yield better → pct_rank(dividend_yield)*100
• risk_beta: lower beta better → (1 - pct_rank(beta))*100
- QUANT composite (0–100) = weighted average:
value_pe 30%, revenue_strength 25%, scale_market_cap 15%, income_dividend 15%, risk_beta 15%.
- If a metric is missing, drop it and re-weight the remainder. If ≥3 metrics missing, set composite=null.
B) AI READINESS (from <AI_DOCS> only; evidence-first):
- Extract up to 3 concrete initiatives for each ticker with short snippets and [source | page]. If none, evidence=[].
- Score each dimension 0–100 (average to composite):
• scope (breadth of use-cases/platforms)
• maturity (shipping vs pilots; tooling depth)
• investment (budget/hiring/partnership signals)
• kpi_impact (clear KPIs, revenue/efficiency impact)
• risk_mgmt (governance, safety, compliance)
- If very weak/absent evidence, give low subscores and state "insufficient evidence".
C) DUAL-LENS COMBO:
- duallens.score = 0.6 * quant.composite + 0.4 * ai_readiness.composite
- If either composite is null, use the other; if both null → duallens.score=null.
- rating_1_to_5 by score:
0–39: 1, 40–54: 2, 55–69: 3, 70–84: 4, 85–100: 5.
STRICT RULES
- Use ONLY provided tables and <AI_DOCS>. Do not add outside facts.
- Do not copy >10 consecutive words from <AI_DOCS>.
- For each AI claim, include a snippet and [source | page].
- If peer set <3, avoid “best/worst”; use comparative language and mention (peer set = N).
- Return valid JSON exactly as specified.
"""
# Write a user message for instructing the LLM for scoring and ranking the companies
user_message = f"""
You will receive:
1) A Financial Data table (per-ticker metrics).
2) An AI Initiatives corpus (<AI_DOCS>) with raw excerpts.
TASK: Produce the JSON object described in the system instructions.
---
### 1) Financial Data
{fin_df.to_string(index=False)}
---
### 2) <AI_DOCS>
{vectorstore.get()['documents']}
"""
# Formatting the prompt
formatted_prompt = f"""[INST]{system_message}
user: {user_message}
[/INST]"""
# Calling the LLM
recommendation = llm_evaluator.invoke(formatted_prompt)
print(recommendation)
content='```json\n{\n "per_company": {\n "GOOGL": {\n "quant": {\n "subscores": {\n "value_pe": 66,\n "scale_market_cap": 60,\n "revenue_strength": 100,\n "income_dividend": 50,\n "risk_beta": 50\n },\n "composite": 66.0\n },\n "ai_readiness": {\n "evidence": [\n {\n "id": "E1",\n "project": "Gemini",\n "snippet": "Gemini is Google/DeepMind’s flagship family of multimodal foundation models designed to advance reasoning, coding, and generative capabilities across text, images, and code.",\n "source": "file | page"\n }\n ],\n "subscores": {\n "scope": 80,\n "maturity": 70,\n "investment": 90,\n "kpi_impact": 85,\n "risk_mgmt": 75\n },\n "composite": 80.0\n },\n "duallens": {\n "score": 73.6,\n "rating_1_to_5": 4,\n "rationale": "GOOGL has strong revenue strength and significant AI initiatives like Gemini, enhancing its market position."\n }\n },\n "MSFT": {\n "quant": {\n "subscores": {\n "value_pe": 40,\n "scale_market_cap": 100,\n "revenue_strength": 80,\n "income_dividend": 70,\n "risk_beta": 50\n },\n "composite": 66.0\n },\n "ai_readiness": {\n "evidence": [\n {\n "id": "E1",\n "project": "Azure AI Foundry Labs",\n "snippet": "Azure AI Foundry Labs is an experimental AI platform developed by Microsoft to accelerate the translation of advanced AI research into real-world applications.",\n "source": "file | page"\n }\n ],\n "subscores": {\n "scope": 85,\n "maturity": 80,\n "investment": 90,\n "kpi_impact": 80,\n "risk_mgmt": 70\n },\n "composite": 81.0\n },\n "duallens": {\n "score": 73.6,\n "rating_1_to_5": 4,\n "rationale": "MSFT\'s high market cap and strong AI initiatives like Azure AI Foundry Labs position it well for future growth."\n }\n },\n "IBM": {\n "quant": {\n "subscores": {\n "value_pe": 30,\n "scale_market_cap": 10,\n "revenue_strength": 20,\n "income_dividend": 100,\n "risk_beta": 80\n },\n "composite": 43.0\n },\n "ai_readiness": {\n "evidence": [\n {\n "id": "E1",\n "project": "IBM Granite",\n "snippet": "IBM Granite is a series of open-source, 
high-performance AI foundation models developed by IBM to empower enterprise applications across various industries.",\n "source": "file | page"\n }\n ],\n "subscores": {\n "scope": 70,\n "maturity": 60,\n "investment": 70,\n "kpi_impact": 75,\n "risk_mgmt": 65\n },\n "composite": 66.0\n },\n "duallens": {\n "score": 56.4,\n "rating_1_to_5": 3,\n "rationale": "IBM\'s AI initiatives like Granite show promise, but its low market cap and revenue strength limit its overall score."\n }\n },\n "NVDA": {\n "quant": {\n "subscores": {\n "value_pe": 0,\n "scale_market_cap": 100,\n "revenue_strength": 90,\n "income_dividend": 0,\n "risk_beta": 100\n },\n "composite": 66.0\n },\n "ai_readiness": {\n "evidence": [\n {\n "id": "E1",\n "project": "Project G-Assist",\n "snippet": "Project G-Assist is NVIDIA’s on-device AI assistant for GeForce RTX PCs, leveraging a local small language model and computer vision.",\n "source": "file | page"\n }\n ],\n "subscores": {\n "scope": 75,\n "maturity": 70,\n "investment": 80,\n "kpi_impact": 75,\n "risk_mgmt": 60\n },\n "composite": 66.0\n },\n "duallens": {\n "score": 66.0,\n "rating_1_to_5": 3,\n "rationale": "NVDA\'s strong revenue and AI initiatives like G-Assist position it well, but its high risk and low dividend yield impact its score."\n }\n },\n "AMZN": {\n "quant": {\n "subscores": {\n "value_pe": 50,\n "scale_market_cap": 80,\n "revenue_strength": 100,\n "income_dividend": 0,\n "risk_beta": 60\n },\n "composite": 66.0\n },\n "ai_readiness": {\n "evidence": [\n {\n "id": "E1",\n "project": "Amazon SageMaker",\n "snippet": "Amazon SageMaker is a fully managed service that simplifies the process of building, training, and deploying machine learning models at scale.",\n "source": "file | page"\n }\n ],\n "subscores": {\n "scope": 85,\n "maturity": 80,\n "investment": 90,\n "kpi_impact": 85,\n "risk_mgmt": 70\n },\n "composite": 82.0\n },\n "duallens": {\n "score": 73.6,\n "rating_1_to_5": 4,\n "rationale": "AMZN\'s high revenue 
strength and robust AI initiatives like SageMaker enhance its market position."\n }\n }\n },\n "ranking": ["AMZN", "GOOGL", "MSFT", "NVDA", "IBM"]\n}\n```' additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 1401, 'prompt_tokens': 45296, 'total_tokens': 46697, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_560af6e559', 'id': 'chatcmpl-CUUOiCAbqC9wsQ7FIoMk5TyiuKoX0', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None} id='run--907d415f-acf6-40b8-b463-9de8b4a7ad22-0' usage_metadata={'input_tokens': 45296, 'output_tokens': 1401, 'total_tokens': 46697, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}
import json, re

def load_llm_json(payload):
    """
    Return a dict parsed from an LLM response that may be:
    - an SDK object (with .content or .choices[0].message.content),
    - a dict already,
    - a string with code fences and/or extra prose.
    """
    # 1) If already a dict with the shape we want
    if isinstance(payload, dict) and ("per_company" in payload or "ranking" in payload):
        return payload

    # 2) Extract text from common SDK objects
    text = None
    if hasattr(payload, "content") and isinstance(payload.content, str):
        text = payload.content
    elif hasattr(payload, "choices"):
        # openai.ChatCompletion style
        try:
            text = payload.choices[0].message["content"]
        except Exception:
            pass
    if text is None:
        # If it's a plain string or anything else, coerce to string
        text = str(payload)

    # 3) Remove triple-backtick fences if present
    text = text.strip()
    text = re.sub(r"^```(?:json)?\s*", "", text, flags=re.IGNORECASE)
    text = re.sub(r"\s*```$", "", text)

    # 4) Try a direct json.loads first
    try:
        return json.loads(text)
    except Exception:
        pass

    # 5) If there's extra prose like "content='...json...'", extract the first balanced JSON object
    def extract_first_json_object(s: str) -> str | None:
        depth = 0
        start = None
        for i, ch in enumerate(s):
            if ch == "{":
                if depth == 0:
                    start = i
                depth += 1
            elif ch == "}":
                if depth > 0:
                    depth -= 1
                    if depth == 0 and start is not None:
                        return s[start:i + 1]
        return None

    blob = extract_first_json_object(text)
    if blob:
        return json.loads(blob)  # may still raise if truly malformed
    raise ValueError("Could not locate valid JSON in model output.")

# ---- usage ----
# recommendation = <your model response>
parsed = load_llm_json(recommendation)
df_rankings = dual_to_frame(parsed)
print(df_rankings.head())
  ticker  Q_comp  AI_comp  DL_score  rating
0  GOOGL    66.0     80.0      73.6       4
1   MSFT    66.0     81.0      73.6       4
2   AMZN    66.0     82.0      73.6       4
3   NVDA    66.0     66.0      66.0       3
4    IBM    43.0     66.0      56.4       3
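`dual_to_frame` is defined earlier in the notebook; for reference, a minimal sketch of what it needs to do — flatten the per-company scores from the parsed JSON into a DataFrame ranked by DualLens score — could look like the following. The column names match the output above, but this implementation is an assumption, not the notebook's actual code.

```python
import pandas as pd

def dual_to_frame(parsed: dict) -> pd.DataFrame:
    """Flatten the per-company DualLens JSON into a ranked DataFrame.

    Hypothetical re-implementation; the notebook defines its own version earlier.
    """
    rows = []
    for ticker, blocks in parsed.get("per_company", {}).items():
        rows.append({
            "ticker": ticker,
            "Q_comp": blocks["quant"]["composite"],
            "AI_comp": blocks["ai_readiness"]["composite"],
            "DL_score": blocks["duallens"]["score"],
            "rating": blocks["duallens"]["rating_1_to_5"],
        })
    # Rank companies by the blended DualLens score, best first
    df = pd.DataFrame(rows).sort_values("DL_score", ascending=False)
    return df.reset_index(drop=True)
```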
6. Summary and Recommendation - 4 Marks¶
A. Summary / Your Observations about this Project - 2 Marks
- In this project, I observed that qualitative LLM reasoning allows expressive, human-like narratives, but it must be tightly controlled to avoid hallucinations. Through experimentation, I found that using a temperature around 0.3 and top-p ≈ 0.95 produced a natural tone without becoming overly deterministic or robotic.
- When combining qualitative reasoning with quantitative financial metrics, I had to be especially careful to enforce evidence-based outputs only. The model easily hallucinates if constraints are not explicit, so grounding through retrieval and deterministic scoring logic became essential. I also noticed that domain-focused prompting sometimes led the model to “echo” parts of the context, which reinforced the need for strict anti-copy rules.
- To ensure reliability of dual-lens responses (quantitative + qualitative), I introduced evaluation logic for groundedness and relevance. This helped validate that the model’s conclusions were factual, sourced, and not inferred beyond the provided context.
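A groundedness check like the one described above can start as something very simple: verify that each evidence snippet actually appears (at least approximately) in the retrieved context. Below is a minimal sketch assuming snippets and context chunks are plain strings; the notebook's actual evaluation logic may instead use an LLM judge or embedding similarity.

```python
from difflib import SequenceMatcher

def groundedness(evidence_snippets, context_chunks, threshold=0.8):
    """Fraction of evidence snippets supported by some retrieved chunk.

    Minimal sketch: exact substring match with a fuzzy fallback for light
    paraphrasing. A score well below 1.0 flags likely hallucinated evidence.
    """
    def supported(snippet):
        s = snippet.lower()
        for chunk in context_chunks:
            c = chunk.lower()
            if s in c:
                return True
            # Fuzzy fallback: overall string similarity above the threshold
            if SequenceMatcher(None, s, c).ratio() >= threshold:
                return True
        return False

    if not evidence_snippets:
        return 0.0
    return sum(supported(s) for s in evidence_snippets) / len(evidence_snippets)
```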
B. Recommendations for this Project / What improvements can be made to this Project - 2 Marks
- Adopt DSPy for RAG Flow Control: DSPy would simplify the orchestration. Instead of manually wiring evaluation, retrieval, and generation steps, DSPy’s declarative predictors and forward-function chaining would reduce workflow overhead and prevent prompt drift. This would likely result in cleaner code, more reproducible behavior, and fewer hallucinations.
- Enhance Investor-Focused Outputs with Visuals: Since financial decisions are heavily visual, adding charts—such as CAGR curves, volatility bands, and valuation comparisons—would appeal to investors and make insights easier to interpret at a glance.
- Productize Through Streamlit: Wrapping the pipeline in a small Streamlit app would turn this into an interactive analyst tool. Investors could:
- Select a ticker
- View key plots and metrics
- Read the dual-lens narrative
- Compare companies side-by-side
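The side-by-side comparison could be backed by a small helper that a Streamlit app would then render with `st.dataframe`. A sketch under that assumption (the function name and columns are hypothetical; it expects the `df_rankings` shape shown earlier):

```python
import pandas as pd

def compare_companies(df_rankings: pd.DataFrame, tickers: list) -> pd.DataFrame:
    """Return the selected tickers side-by-side, one column per company.

    Hypothetical helper for the proposed Streamlit app; assumes df_rankings
    has a 'ticker' column plus the score columns produced by dual_to_frame.
    """
    subset = df_rankings[df_rankings["ticker"].isin(tickers)]
    # Transpose so each selected company becomes a column and metrics become rows
    return subset.set_index("ticker").T
```

In the Streamlit app, a `st.multiselect` over the ticker column would feed `tickers`, and the returned frame would be the comparison view.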